
    Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

    Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. While the visual information from overhead images, combined with powerful models and efficient algorithms, yields considerable performance on scene recognition, it still suffers from variation in ground objects, lighting conditions, and the like. Inspired by the multi-channel perception theory in cognitive science, in this paper we explore a novel audiovisual aerial scene recognition task that uses both images and sounds as input, with the aim of improving aerial scene recognition performance. Based on the observation that some specific sound events are more likely to be heard at a given geographic location, we propose to exploit knowledge from sound events to improve performance on the aerial scene recognition task. For this purpose, we have constructed a new dataset named AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this dataset, we evaluate three proposed approaches for transferring sound event knowledge to the aerial scene recognition task in a multimodal learning framework, and show the benefit of exploiting audio information for aerial scene recognition. The source code is publicly available for reproducibility purposes. Comment: ECCV 2020
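    Although the paper's own architecture is not reproduced here, a minimal PyTorch sketch illustrates the general shape of such a multimodal framework: an image branch and an audio branch whose fused features predict the scene, with an auxiliary sound-event head as one plausible route for transferring sound-event knowledge. All module names, dimensions, and class counts below are illustrative assumptions, not the ADVANCE setup.

```python
# Minimal sketch (not the authors' code): two-branch audiovisual scene
# recognition with an auxiliary sound-event head for knowledge transfer.
import torch
import torch.nn as nn

class AudioVisualSceneNet(nn.Module):
    def __init__(self, n_scenes=13, n_events=527, feat_dim=512):
        super().__init__()
        # Stand-ins for real encoders (e.g., a ResNet over RGB images and a
        # CNN over log-mel spectrograms); LazyLinear infers the input size.
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.audio_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.scene_head = nn.Linear(2 * feat_dim, n_scenes)  # fused scene prediction
        self.event_head = nn.Linear(feat_dim, n_events)      # auxiliary sound-event head

    def forward(self, image, audio):
        v = self.image_encoder(image)
        a = self.audio_encoder(audio)
        scene_logits = self.scene_head(torch.cat([v, a], dim=1))
        event_logits = self.event_head(a)  # carries the transferred event knowledge
        return scene_logits, event_logits

model = AudioVisualSceneNet()
scene_logits, event_logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 128, 300))
# One plausible training objective: cross-entropy on scene labels plus a
# weighted BCE term on sound-event labels, e.g. L = CE + lambda * BCE.
```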

    GazeDirector: Fully articulated eye gaze redirection in video

    We present GazeDirector, a new model-fitting approach for eye gaze redirection. Our method first tracks the eyes by fitting a multi-part eye region model to video frames using analysis-by-synthesis, thereby recovering eye region shape, texture, pose, and gaze simultaneously. It then redirects gaze by 1) warping the eyelids from the original image using a model-derived flow field, and 2) rendering and compositing synthesized 3D eyeballs onto the output image in a photorealistic manner. GazeDirector allows us to change where people are looking without person-specific training data, and with full articulation, i.e., we can precisely specify new gaze directions in 3D. Quantitatively, we evaluate both model-fitting and gaze synthesis, with experiments for gaze estimation and redirection on the Columbia gaze dataset. Qualitatively, we compare GazeDirector against recent work on gaze redirection, showing better results, especially for large redirection angles. Finally, we demonstrate gaze redirection on YouTube videos by introducing new 3D gaze targets and by manipulating visual behavior.
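    As a rough illustration of the two rendering steps named above (not the authors' implementation), the sketch below warps a frame with a sampling grid via torch.nn.functional.grid_sample and alpha-composites a rendered eyeball layer on top; the flow field and eyeball layers here are stand-ins for what the fitted model would produce.

```python
# Minimal sketch of 1) flow-field warping and 2) eyeball compositing.
import torch
import torch.nn.functional as F

def redirect_gaze(frame, grid, eyeball_rgb, eyeball_alpha):
    """frame/eyeball_rgb: (1, 3, H, W); grid: (1, H, W, 2) sampling
    coordinates in [-1, 1]; eyeball_alpha: (1, 1, H, W) in [0, 1]."""
    warped = F.grid_sample(frame, grid, align_corners=False)            # 1) eyelid warping
    return eyeball_alpha * eyeball_rgb + (1 - eyeball_alpha) * warped   # 2) compositing

H, W = 64, 96
# Identity grid as a stand-in for the model-derived flow field.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)
out = redirect_gaze(torch.rand(1, 3, H, W), grid, torch.rand(1, 3, H, W), torch.rand(1, 1, H, W))
```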

    Investigating non-classical correlations between decision fused multi-modal documents

    Correlation has been widely used to facilitate various information retrieval methods such as query expansion, relevance feedback, document clustering, and multi-modal fusion. In particular, correlation and independence are important issues when fusing the different modalities that influence a multi-modal information retrieval process. The basic idea of correlation is that one observable can help predict or enhance another observable. In quantum mechanics, quantum correlation, called entanglement, is a sort of correlation between observables measured on atomic-size particles when these particles are not necessarily collected in ensembles. In this paper, we examine a multi-modal fusion scenario that might be similar to that encountered in physics by first measuring two observables (i.e., text-based relevance and image-based relevance) of a multi-modal document without relying on an ensemble of multi-modal documents already labeled in terms of these two variables. Then, we investigate the existence of non-classical correlations between pairs of multi-modal documents. Although there are some basic differences between entanglement and the classical correlation encountered in the macroscopic world, we investigate the existence of this kind of non-classical correlation through violation of the Bell inequality. Here, we experimentally test several novel association methods in a small-scale experiment; however, in the current experiment we did not find any violation of the Bell inequality. Finally, we present a series of interesting discussions, which may provide theoretical and empirical insights and inspiration for future development of this direction.
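    For concreteness, the CHSH form of the Bell test used in such studies can be stated compactly: with two measurement settings per modality (say a, a′ for text relevance and b, b′ for image relevance) and outcomes coded as ±1, any classical correlation satisfies |S| ≤ 2, where S = E(a,b) − E(a,b′) + E(a′,b) + E(a′,b′). A small Python sketch with synthetic outcomes (not the paper's data) shows the computation:

```python
# CHSH statistic from four runs of paired +/-1 outcomes.
import numpy as np

def E(x, y):
    """Correlation E[xy] for paired +/-1 outcomes."""
    return float(np.mean(np.asarray(x) * np.asarray(y)))

def chsh(ab, ab2, a2b, a2b2):
    """S = E(a,b) - E(a,b') + E(a',b) + E(a',b'); |S| <= 2 classically."""
    return E(*ab) - E(*ab2) + E(*a2b) + E(*a2b2)

rng = np.random.default_rng(0)
# Four synthetic runs, one per setting pair; real data would be the +/-1
# relevance outcomes measured on pairs of multi-modal documents.
runs = [(rng.choice([-1, 1], 1000), rng.choice([-1, 1], 1000)) for _ in range(4)]
S = chsh(*runs)
print(S, abs(S) <= 2)  # independent outcomes stay within the classical bound
```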

    Towards a Taxonomy of Cognitive RPA Components

    Robotic Process Automation (RPA) is a discipline that is increasingly growing hand in hand with Artificial Intelligence (AI) and Machine Learning, enabling so-called cognitive automation. In this context, the existing RPA platforms that include AI-based solutions classify their components, i.e., the constituent parts of a robot that perform a set of actions, in a way that seems to obey market or business decisions instead of common-sense rules. To be more precise, components that present similar functionality are identified with different names and grouped in different ways depending on the platform that provides them. Therefore, analyzing different cognitive RPA platforms to check their suitability for a specific need is typically a time-consuming and error-prone task. To overcome this problem and to provide users with support in the development of an RPA project, this paper proposes a method for the systematic construction of a taxonomy of cognitive RPA components. Moreover, the method is applied to components that solve selected real-world industrial use cases, obtaining promising results. Funding: Ministerio de Economía y Competitividad TIN2016-76956-C3-2-R; Junta de Andalucía CEI-12-TIC021; Centro para el Desarrollo Tecnológico Industrial P011-19/E0
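    As a purely hypothetical illustration of what such a taxonomy might look like as a data structure (this is not the paper's method), the sketch below maps platform-specific component names onto a shared functional category; all platform and component names are invented.

```python
# Toy taxonomy: functional categories with per-platform component aliases.
from dataclasses import dataclass, field

@dataclass
class TaxonomyNode:
    name: str
    children: list["TaxonomyNode"] = field(default_factory=list)
    components: dict[str, str] = field(default_factory=dict)  # platform -> component name

# Hypothetical grouping: two platforms name the same OCR functionality differently.
ocr = TaxonomyNode("Text extraction", components={"PlatformA": "ReadDocument",
                                                  "PlatformB": "OCREngine"})
root = TaxonomyNode("Cognitive RPA components",
                    children=[TaxonomyNode("Vision", children=[ocr])])
```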

    Learning Tversky Similarity

    In this paper, we advocate Tversky's ratio model as an appropriate basis for computational approaches to semantic similarity, that is, the comparison of objects such as images in a semantically meaningful way. We consider the problem of learning Tversky similarity measures from suitable training data indicating whether two objects tend to be similar or dissimilar. Experimentally, we evaluate our approach to similarity learning on two image datasets, showing that it performs very well compared to existing methods.
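    Tversky's ratio model itself is simple to state: for feature sets A and B, S(A, B) = f(A ∩ B) / (f(A ∩ B) + α·f(A − B) + β·f(B − A)), where α and β weight each object's distinctive features and are the quantities learned from similar/dissimilar pairs. A minimal Python sketch of the measure follows; the learning procedure is omitted and the feature sets are toy examples:

```python
def tversky_similarity(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """S(A, B) = |A & B| / (|A & B| + alpha*|A - B| + beta*|B - A|)."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 1.0  # treat two empty sets as identical

# Toy feature sets standing in for image annotations.
print(tversky_similarity({"sky", "water", "boat"}, {"sky", "water", "car"}))  # ~0.667
```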

    Detecting Human Activities Based on a Multimodal Sensor Data Set Using a Bidirectional Long Short-Term Memory Model: A Case Study

    Human falls are one of the leading causes of fatal unintentional injuries worldwide. Falls result in a direct financial cost to health systems and, indirectly, to society's productivity. Unsurprisingly, human fall detection and prevention is a major focus of health research. In this chapter, we present and evaluate several bidirectional long short-term memory (Bi-LSTM) models using a data set provided by the Challenge UP competition. The main goal of this study is to detect 12 human daily activities (six daily human activities, five falls, and one post-fall activity) derived from multi-modal data sources: wearable sensors, ambient sensors, and vision devices. Our proposed Bi-LSTM model leverages data from accelerometer and gyroscope sensors located at the ankle, right pocket, belt, and neck of the subject. We use a grid search to evaluate variations of the Bi-LSTM model and identify the configuration that gives the best results. The best Bi-LSTM model achieved a precision of 43.30% and an F1-score of 38.50%.
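    As a sketch of the kind of model described (the architecture and hyperparameters here are assumptions, not the study's winning grid-search configuration), a bidirectional LSTM over fixed-length sensor windows might look like this in PyTorch:

```python
# Bi-LSTM window classifier over multi-sensor time series.
import torch
import torch.nn as nn

class BiLSTMActivityNet(nn.Module):
    def __init__(self, n_channels=24, hidden=64, n_classes=12):
        # 24 channels ~ 4 sensor sites x (3-axis accelerometer + 3-axis gyroscope).
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):            # x: (batch, time, channels)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])   # classify each window from its last time step

model = BiLSTMActivityNet()
logits = model(torch.randn(8, 100, 24))  # 8 windows of 100 samples -> (8, 12) logits
```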

    Learning an Appearance-based Gaze Estimator from One Million Synthesised Images


    A 3D Morphable Eye Region Model for Gaze Estimation
